Proposed Formula Based on Study of Correlation between Hub and Spoke Architecture and Bus Architecture in Data Warehouse Architecture, Based on Distinct Parameters
Rajdeep Chowdhury*, Bikramjit Pal and Saikat Ghosh
Department of Computer Application, JIS College of Engineering, Block A, Phase III, Kalyani, Nadia-741235, West Bengal, India
ABSTRACT:
Data warehousing has evolved with every passing decade and has come a long way from its inception; in the modern era it has become an established part of existing analytical methodologies. At present, data warehousing has evolved into a system capable of furnishing key performance metrics to high-level management, providing analytical capability to middle-level management, and feeding corrective data back to low-level management on the basis of information derived from the analytical system. The data warehouse market is currently driven by business-oriented solutions that focus on domain-specific challenges and their allied complexities, which have shaped the basic nuances of data warehousing. The present business ideologies of the global village have raised new and tougher challenges for data warehouse designers and architects, who are expected to assemble bigger and better solutions. Although numerous methods are available in the global market to cope with these stiff challenges, they have not made much of an impact in the global arena, and the competitive market has repeatedly prompted ventures into unexplored territory. Data warehouses are designed to facilitate reporting and analysis. This characteristic centres on data storage: the warehouse acts much like a buffer that absorbs a continuous stock of data, which is processed through numerous iterative steps to evolve into information awaited by all tiers of an organization for various decision-making processes. For over a decade, discussions and even controversies have lingered about which of the existing architectures is the best data warehouse architecture. The two giants of the data warehousing field, Bill Inmon and Ralph Kimball, are at the heart of the disagreement. Inmon advocates the Hub and Spoke architecture (for example, the Corporate Information Factory), while Kimball promotes the data mart Bus architecture with conformed dimensions. There are other architecture alternatives, but these two options are fundamentally different approaches, and each has strong advocates who have implemented it.
KEYWORDS: Data Warehouse, Hub and Spoke Architecture, Bus Architecture, Business Intelligence, Federated and Data Mart, Repetition Constant, Propagation Constant.
INTRODUCTION:
A data warehouse is a well-established repository, or storage place, for an organization's electronically stored data. [2] [3] Data warehouses are designed to facilitate comprehensive reporting and thorough analysis. [3]
The data warehouse architectures that exist and are widely used in the industry fall mainly into five types: [4]
a) Independent data mart
b) Data mart Bus architecture
c) Hub and Spoke architecture
d) Centralized data warehouse (no dependent data marts)
e) Federated
A web-based survey was conducted to collect data regarding the performance of each of the above architectures. The survey included questions about the respondent, the respondent's company, the company's data warehouse and the success of the data warehouse architecture. The positions of the respondents were distributed relatively evenly among data warehouse managers, data warehouse staff members, IS managers and independent consultants/system integrators.
CONCEPTUAL LITERATURE REVIEW:
In order to fully understand the impact of a data warehouse in an industrial scenario, it is important first to step back and establish what a data warehouse is, by carrying out a feasibility study and evaluating various scenarios, and then to ask why it should be implemented and from what perspective it will be accountable. Lastly, the conceptual literature review focuses on how the distinct architectures are correlated via the proposed formula in the modern trends of an industrial scenario, keeping in mind both the subject world and the usage world. An implementation of the proposed formula is worked through.
SUCCESS OF THE ARCHITECTURE:
Four measures were used to assess the success of the architectures: (1) information quality, (2) system quality, (3) individual impacts, and (4) organizational impacts. The questions used a seven-point scale, with a higher score indicating a more successful architecture. The table below shows the average scores for the measures across the architectures.
Independent data marts scored the lowest on all measures. This finding confirms the conventional wisdom that independent data marts are a poor architectural solution.
Next lowest on all measures was the federated architecture. Firms sometimes have disparate decision-support platforms resulting from mergers and acquisitions, and they may choose a federated approach, at least in the short run. The findings suggest that the federated architecture is not an optimal long-term solution.
| Measure | Independent Data Mart [1] | Bus Architecture [2] | Hub and Spoke Architecture [3] | Centralized (No Dependent Data Marts) [4] | Federated [5] |
|---|---|---|---|---|---|
| Information Quality | 4.42 | 5.16 | 5.35 | 5.23 | 4.73 |
| System Quality | 4.59 | 5.60 | 5.56 | 5.41 | 4.69 |
| Individual Impact | 5.08 | 5.80 | 5.62 | 5.64 | 5.15 |
| Organizational Impact | 4.66 | 5.34 | 5.24 | 5.30 | 4.77 |
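The ranking claims made for this table can be checked mechanically. The following is a minimal Python sketch, assuming only the survey means listed above; the names `scores`, `measures`, `rank_for_measure`, `rankings` and `lowest_two` are illustrative and not part of the original study.

```python
# Survey means from the table above (seven-point scale), keyed by architecture.
# Each list is ordered as: information quality, system quality,
# individual impact, organizational impact.
scores = {
    "Independent Data Mart": [4.42, 4.59, 5.08, 4.66],
    "Bus Architecture": [5.16, 5.60, 5.80, 5.34],
    "Hub and Spoke Architecture": [5.35, 5.56, 5.62, 5.24],
    "Centralized (No Dependent Data Marts)": [5.23, 5.41, 5.64, 5.30],
    "Federated": [4.73, 4.69, 5.15, 4.77],
}
measures = [
    "Information Quality",
    "System Quality",
    "Individual Impact",
    "Organizational Impact",
]


def rank_for_measure(index):
    """Return architecture names ordered from lowest to highest score."""
    return sorted(scores, key=lambda name: scores[name][index])


# Full ordering per measure, and the two lowest-scoring architectures.
rankings = {m: rank_for_measure(i) for i, m in enumerate(measures)}
lowest_two = {m: tuple(order[:2]) for m, order in rankings.items()}

for measure, pair in lowest_two.items():
    print(measure, "->", pair)
```

On every measure the two lowest entries are the independent data mart followed by the federated architecture, which agrees with the observations stated above.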
Correlation between Hub and Spoke Architecture and Bus Architecture
| Measure | Hub and Spoke Architecture [x] | Bus Architecture [y] | x*y | x² | y² |
|---|---|---|---|---|---|
| Information Quality | 5.35 | 5.16 | 27.606 | 28.6225 | 26.6256 |
| System Quality | 5.56 | 5.60 | 31.136 | 30.9136 | 31.36 |
| Individual Impact | 5.62 | 5.80 | 32.596 | 31.5844 | 33.64 |
| Organizational Impact | 5.24 | 5.34 | 27.9816 | 27.4576 | 28.5156 |
| Total | 21.77 | 21.90 | 119.3196 | 118.5781 | 120.1412 |
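The coefficient of correlation is defined as:

\[
R = \frac{\sum xy - \dfrac{\left(\sum x\right)\left(\sum y\right)}{N}}{\sqrt{\left(\sum x^{2} - \dfrac{\left(\sum x\right)^{2}}{N}\right)\left(\sum y^{2} - \dfrac{\left(\sum y\right)^{2}}{N}\right)}} \tag{i}
\]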
Using the data given above, with N = 4 (the number of measures), the calculated value is R = 0.8562.
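The calculation can be reproduced with a short Python sketch. It assumes the four survey means per architecture are the only inputs and follows the sums-based form of the coefficient of correlation; the function name `correlation` and the variable names are illustrative.

```python
import math

# Survey means for the four measures (information quality, system quality,
# individual impact, organizational impact).
hub_and_spoke = [5.35, 5.56, 5.62, 5.24]  # x
bus = [5.16, 5.60, 5.80, 5.34]            # y


def correlation(x, y):
    """Coefficient of correlation computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - (sum_x * sum_y) / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


R = correlation(hub_and_spoke, bus)
print(round(R, 4))  # approximately 0.8562
```

The intermediate sums computed here correspond to the Total row of the table (Σx = 21.77, Σy = 21.90, Σxy = 119.3196, Σx² = 118.5781, Σy² = 120.1412).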
Correlation between All Five Architectures: Implementation via Case Study
| Measure | Independent Data Mart [1] | Bus Architecture [2] | Hub and Spoke Architecture [3] | Centralized (No Dependent Data Marts) [4] | Federated [5] |
|---|---|---|---|---|---|
| Information Quality | 4.42 | 5.16 | 5.35 | 5.23 | 4.73 |
| System Quality | 4.59 | 5.60 | 5.56 | 5.41 | 4.69 |
| Individual Impact | 5.08 | 5.80 | 5.62 | 5.64 | 5.15 |
| Organizational Impact | 4.66 | 5.34 | 5.24 | 5.30 | 4.77 |
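PROPOSED FORMULA:

\[
f_{cd} = \frac{\sum x_{ic}\, y_{jd} - \dfrac{\left(\sum x_{ic}\right)\left(\sum y_{jd}\right)}{N}}{\sqrt{\left(\sum x_{ic}^{2} - \dfrac{\left(\sum x_{ic}\right)^{2}}{N}\right)\left(\sum y_{jd}^{2} - \dfrac{\left(\sum y_{jd}\right)^{2}}{N}\right)}}
\]

where c stands for the Repetition Constant for i, and d stands for the Propagation Constant for j:
· c = 1 to n − 1 (where n is the number of variables)
· d = 2 + m to n (where m increases until c reaches n − 1)

With f denoting the sum of the values f_{cd} over all such pairs (c, d):

\[
r = \frac{f}{{}^{n}C_{2}}
\]

(where n is the number of variables)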
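The index ranges for c and d can be illustrated with a small Python sketch. It assumes that m starts at 0 and grows by one with each increment of c (so that m = c − 1 and d always starts at c + 1); this reading is an interpretation of the definitions above, and the name `index_pairs` is illustrative.

```python
from math import comb


def index_pairs(n):
    """Yield (c, d) pairs: c runs from 1 to n-1, d runs from 2+m to n.

    Assumption: m = c - 1, so each unordered pair of variables
    appears exactly once.
    """
    for c in range(1, n):
        m = c - 1
        for d in range(2 + m, n + 1):
            yield (c, d)


pairs = list(index_pairs(5))
print(pairs)
print(len(pairs) == comb(5, 2))
```

Under this reading the number of generated pairs always equals nC2, which is the divisor used when computing r.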
WORKING:
As we have five different architecture types, the value of nC2 will be:
5C2 = 10
For the 10 different combinations, we will have the following:
f12, f13, f14, f15, f23, f24, f25, f34, f35, f45
The values of the above combinations are:
f12 = 0.8569, f13 = 0.5917, f14 = 0.9375, f15 = 0.9377, f23 = 0.8562,
f24 = 0.9639, f25 = 0.7011, f34 = 0.8334, f35 = 0.5446, f45 = 0.8613
Thus, the value of f is:
f = (f12 + f13 + f14 + f15 + f23 + f24 + f25 + f34 + f35 + f45)
f = 8.0843
Therefore, the final value of r is:
r = f / nC2 = 8.0843 / 10
r = 0.80843
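The whole working can be reproduced with a short Python sketch. It assumes the survey means from the table of all five architectures, numbered 1 to 5 as in the column headers, and applies the coefficient of correlation to every pair; the names `columns`, `correlation`, `f_values`, `f` and `r` are illustrative.

```python
import math
from itertools import combinations

# Survey means per architecture, numbered as in the table headers.
columns = {
    1: [4.42, 4.59, 5.08, 4.66],  # Independent Data Mart
    2: [5.16, 5.60, 5.80, 5.34],  # Bus Architecture
    3: [5.35, 5.56, 5.62, 5.24],  # Hub and Spoke Architecture
    4: [5.23, 5.41, 5.64, 5.30],  # Centralized (No Dependent Data Marts)
    5: [4.73, 4.69, 5.15, 4.77],  # Federated
}


def correlation(x, y):
    """Coefficient of correlation computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    numerator = sum(a * b for a, b in zip(x, y)) - (sum_x * sum_y) / n
    spread_x = sum(a * a for a in x) - sum_x ** 2 / n
    spread_y = sum(b * b for b in y) - sum_y ** 2 / n
    return numerator / math.sqrt(spread_x * spread_y)


# One value per unordered pair (c, d) with c < d: f12, f13, ..., f45.
f_values = {
    (c, d): correlation(columns[c], columns[d])
    for c, d in combinations(sorted(columns), 2)
}
f = sum(f_values.values())
r = f / math.comb(len(columns), 2)

for (c, d), value in f_values.items():
    print(f"f{c}{d} = {value:.4f}")
print(f"f = {f:.4f}, r = {r:.5f}")
```

Because the sketch carries full floating-point precision, its output may differ slightly in the last decimal places from the four-decimal figures reported above.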
CONCLUSION:
At the end of the evaluation, we conclude that the data warehouse and the data mart have a co-existing relationship: both adhere to a user analysis and reporting methodology that is designed and conceptualized from the user's perspective and has prominent practical application in the real world.
The result shows that the hub and spoke architecture and the bus architecture are very closely correlated in terms of all the distinct parameters. This finding helps explain why these competing architectures have survived over time and through periodic turmoil. They are equally successful for their intended purposes, and each appears to suit the needs of its individual users. In terms of information quality, system quality, individual impact and organizational impact, neither architecture is dominant or markedly superior to the other.
Similarly, the correlation among the other architectures can be computed, which helps explain their continued existence in the industry. In some ways, the architectures have evolved over time and become more similar. Even the development methodologies (for example, the top-down methodology for the hub and spoke architecture and the centralized architecture, and the life cycle or bottom-up methodology for the bus architecture) have evolved and become more similar.
REFERENCES:
1. A. Mendelzon, C. Hurtado and D. Lemire, Data Warehousing and OLAP: A Research-Oriented Bibliography
2. B. Pal, Comparison of Data Warehouse Architecture Based on Data Model, International Journal of Information Technology & Knowledge Management, July-December 2010, Volume 2, Number 2, pp. 303-304
3. R. Chowdhury, B. Pal, Proposed Hybrid Data Warehouse Architecture Based on Data Model, International Journal of Computer Science & Communication, July-December 2010, Volume 1, Number 2, pp. 211-213
4. http://en.wikipedia.org/wiki/Data_warehouse
5. http://db.stanford.edu/pub/papers/warehouse-research
Received on 03.04.2011
Modified on 12.04.2011
Accepted on 17.04.2011
© A&V Publication, all rights reserved
Research J. Science and Tech. 3(3): May-June 2011: 154-157